Corpus-based unit selection for natural-sounding speech synthesis
Abstract
Speech synthesis is an automatic encoding process carried out by machine through which symbols conveying linguistic information are converted into an acoustic waveform. In the past decade or so, a trend toward a non-parametric, corpus-based approach has focused on using real human speech as source material for producing novel natural-sounding speech. This work proposes a communication-theoretic formulation in which unit selection is a noisy channel: an input sequence of symbols passes through it, and an output sequence emerges that may be corrupted because of the coverage limits of the corpus. The penalty of approximation is quantified by substitution and concatenation costs. These costs grade which unit contexts are interchangeable and where concatenations are not perceivable. They are semi-automatically derived from data and are found to agree with acoustic-phonetic knowledge.

The implementation is based on a finite-state transducer (FST) representation that has been used successfully in speech and language processing applications, including speech recognition. A proposed constraint kernel topology connects all units in the corpus with associated substitution and concatenation costs. It enables an efficient Viterbi search that operates with low latency and scales to large corpora. An A* search can be applied in a second, rescoring pass to incorporate finer acoustic modelling. Extensions to this FST-based search include hierarchical and paralinguistic modelling. The search can also be used in an iterative feedback loop to record new utterances that enhance corpus coverage.

This speech synthesis framework has been deployed across various domains and languages in many voices, a testament to its flexibility and rapid-prototyping capability. In one evaluation, experimental subjects completed tasks in an air travel planning scenario by interacting in real time with a spoken dialogue system over the telephone. They found this system "easiest to understand" out of eight competing systems. In more detailed listening evaluations, subjective opinions garnered from human participants are found to correlate with objective measures calculable by machine.

Thesis Supervisor: James R. Glass
Title: Principal Research Scientist
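The abstract describes unit selection as a search for the unit sequence u_1..u_n that minimizes a total cost for the targets t_1..t_n. That total is the sum of the substitution costs S(t_i, u_i) plus the concatenation costs C(u_{i-1}, u_i). Below is a minimal Python sketch of a Viterbi search over such costs. It is not the thesis's FST-based implementation: the constraint kernel, transducer composition, and A* rescoring pass are all omitted. The `Unit` layout, the `PHONE_CLASS` table, and the penalty values (0.5, 1.0, 2.0) are hypothetical illustration choices. The thesis derives its costs semi-automatically from data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical broad phone classes used to grade context mismatches (toy assumption).
PHONE_CLASS = {"p": "stop", "t": "stop", "k": "stop", "b": "stop",
               "m": "nasal", "n": "nasal", "ae": "vowel", "#": "silence"}


@dataclass(frozen=True)
class Unit:
    """A phone-sized unit cut from the recorded corpus (hypothetical layout)."""
    uid: int    # corpus position; unit uid + 1 directly follows it in the recording
    phone: str  # phonetic label
    left: str   # phone that preceded it in the recording
    right: str  # phone that followed it in the recording


def context_mismatch(wanted: str, actual: str) -> float:
    """Toy penalty: 0 if identical, 0.5 within one phone class, 1.0 otherwise."""
    if wanted == actual:
        return 0.0
    same_class = PHONE_CLASS.get(wanted, wanted) == PHONE_CLASS.get(actual, actual)
    return 0.5 if same_class else 1.0


def substitution_cost(target: Tuple[str, str, str], unit: Unit) -> float:
    """Penalty for using `unit` where the (left, phone, right) target context is wanted."""
    left, _, right = target
    return context_mismatch(left, unit.left) + context_mismatch(right, unit.right)


def concatenation_cost(prev: Unit, cur: Unit, join_penalty: float = 2.0) -> float:
    """Free if the two units were contiguous in the corpus, else a flat join penalty."""
    return 0.0 if cur.uid == prev.uid + 1 else join_penalty


def viterbi_select(phones: Sequence[str], corpus: Sequence[Unit]) -> Tuple[float, List[Unit]]:
    """Return the minimum total cost and the unit sequence that realizes `phones`."""
    if not phones:
        return 0.0, []
    padded = ["#", *phones, "#"]
    targets = [(padded[i], padded[i + 1], padded[i + 2]) for i in range(len(phones))]
    # Candidates must carry the same phone label; only their contexts may differ.
    cands = [[u for u in corpus if u.phone == p] for p in phones]
    if any(not c for c in cands):
        raise ValueError("corpus has no unit for some target phone")

    best = [[substitution_cost(targets[0], u) for u in cands[0]]]
    back: List[List[int]] = [[-1] * len(cands[0])]
    for i in range(1, len(phones)):
        row, ptr = [], []
        for u in cands[i]:
            # Cheapest predecessor, including the cost of joining it to u.
            scores = [best[i - 1][j] + concatenation_cost(v, u)
                      for j, v in enumerate(cands[i - 1])]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            row.append(scores[j_best] + substitution_cost(targets[i], u))
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final unit.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path: List[Unit] = []
    for i in range(len(phones) - 1, -1, -1):
        path.append(cands[i][k])
        k = back[i][k]
    path.reverse()
    return total, path


def units_from(start_uid: int, phones: Sequence[str]) -> List[Unit]:
    """Cut one recorded phone string into units that keep their original contexts."""
    padded = ["#", *phones, "#"]
    return [Unit(start_uid + i, p, padded[i], padded[i + 2]) for i, p in enumerate(phones)]


# Toy corpus of two recordings; synthesize "m ae t", which neither contains whole.
corpus = units_from(0, ["k", "ae", "t"]) + units_from(10, ["m", "ae", "p"])
cost, units = viterbi_select(["m", "ae", "t"], corpus)
print(cost, [u.uid for u in units])  # 2.5 [10, 11, 2]
```

The search keeps the contiguous "m ae" from the second recording. That choice costs 0.5 for the right-context mismatch (/p/ in place of /t/, both stops) plus one 2.0 join. Splicing "ae t" from the first recording instead would cost 3.0, because its left context /k/ is a stop where a nasal was wanted. This shows in miniature how graded substitution costs decide which contexts are interchangeable.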
Publication date: 2003